Improving cluster-based missing value estimation of DNA microarray data.

Authors

  • Lígia P Brás
  • José C Menezes
Abstract

We present a modification of the weighted K-nearest neighbours imputation method (KNNimpute) for missing value (MV) estimation in microarray data, based on the reuse of estimated data. The method is called iterative KNN imputation (IKNNimpute) because the estimation is performed iteratively using the recently estimated values. The estimation efficiency of IKNNimpute was assessed under different conditions (data type, fraction and structure of missing data) by the normalized root mean squared error (NRMSE) and the correlation coefficients between estimated and true values, and compared with that of other cluster-based estimation methods (KNNimpute and sequential KNN). We further investigated the influence of imputation on the detection of differentially expressed genes using SAM by examining the differentially expressed genes that are lost after MV estimation. The performance measures give consistent results, indicating that the iterative procedure of IKNNimpute can enhance the prediction ability of cluster-based methods in the presence of high missing rates, in non-time series experiments and in data sets comprising both time series and non-time series data, because the information of the genes having MVs is used more efficiently and the iterative procedure allows the MV estimates to be refined. More importantly, IKNNimpute has a smaller detrimental effect on the detection of differentially expressed genes.
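The abstract gives the idea but not the details, so the following is only a minimal sketch of how an iterative, weighted KNN imputation and an NRMSE score might look. It is not the authors' implementation. The function names, the row-mean initialisation, the Euclidean distance, the inverse-distance weights, the fixed iteration count and the normalisation of the error by the standard deviation of the true values are all assumptions made for illustration.

```python
import numpy as np

def iterative_knn_impute(X, k=3, n_iter=5):
    """Hypothetical sketch: X is a genes x samples array with NaN for missing values.
    Assumes every gene has at least one observed value and k < number of genes."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Assumed starting point: fill each gene's gaps with that gene's observed mean.
    row_means = np.nanmean(X, axis=1)
    filled = np.where(missing, row_means[:, None], X)
    rows_with_mv = np.where(missing.any(axis=1))[0]
    for _ in range(n_iter):
        updated = filled.copy()
        for i in rows_with_mv:
            # Distances are computed on the current completed matrix, so the
            # estimates from the previous pass are reused when picking neighbours.
            dist = np.sqrt(((filled - filled[i]) ** 2).sum(axis=1))
            dist[i] = np.inf  # a gene is not its own neighbour
            nn = np.argsort(dist)[:k]
            w = 1.0 / (dist[nn] + 1e-12)  # assumed inverse-distance weights
            est = (w[:, None] * filled[nn]).sum(axis=0) / w.sum()
            # Only the originally missing entries are overwritten.
            updated[i, missing[i]] = est[missing[i]]
        filled = updated
    return filled

def nrmse(estimated, true):
    """RMSE divided by the standard deviation of the true values (assumed normalisation)."""
    estimated = np.asarray(estimated, dtype=float)
    true = np.asarray(true, dtype=float)
    return np.sqrt(np.mean((estimated - true) ** 2)) / np.std(true)
```

In an evaluation of the kind the abstract describes, one would typically hide known entries of a complete matrix, impute them, and pass only the hidden positions to `nrmse`, for example `nrmse(imputed[mask], complete[mask])`.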

Similar resources

Missing value estimation methods for DNA microarrays

Motivation: Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. For example, methods such as hierarchical clustering and K-means clustering are not robust to missing data, and may lose effectiveness even with a few missing values....

Missing Value Estimation for DNA Microarray Expression Data: Least Squares Imputation

Motivation: Gene expression microarray data sets often contain missing expression values. Robust missing value estimation methods are needed since many algorithms for gene expression analysis require a complete matrix of gene array values. In this paper, imputation methods based on the least squares and cluster structure are proposed to estimate missing values in the gene expression data, which...

Improving missing value estimation in microarray data with gene ontology

Motivation: Gene expression microarray experiments produce datasets with frequent missing expression values. Accurate estimation of missing values is an important prerequisite for efficient data analysis as many statistical and machine learning techniques either require a complete dataset or their results are significantly dependent on the quality of such estimates. A limitation of the existing ...

A Novel Distance Based Modified K-means Clustering Algorithm for Estimation of Missing Values in Micro-array Gene Expression Data

Microarray experiments normally produce data sets with multiple missing expression values, due to various experimental problems. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene expression values as input. Therefore, effective missing value estimation methods are needed to minimize the effect of incomplete data during analysis of gene expression data...

An Improved Fixed Rank Approximation Algorithm for Missing Value Estimation for DNA Microarray Data

Gene expression data matrices often contain missing expression values. In this paper, we describe an improved fixed rank approximation algorithm (IFRAA) and compare it to the three recent methods for reconstructing missing entries for DNA microarray gene expression data: the Bayesian principal component analysis (BPCA), the fixed rank approximation algorithm (FRAA) and the local least squares ...

Journal title:
  • Biomolecular engineering

Volume 24, Issue 2

Pages  -

Publication date: 2007